Abstract

The state of the art in face recognition has been significantly
advanced by the emergence of deep learning. Very deep neural
networks have recently achieved great success in general object
recognition because of their superb learning capacity. This
motivates us to investigate their effectiveness on face
recognition. This paper proposes two very deep neural network
architectures, referred to as DeepID3, for face recognition.
These two architectures are rebuilt from the stacked convolution
and inception layers proposed in VGG net and GoogLeNet to make
them suitable for face recognition. Joint face
identification-verification supervisory signals are added to both
intermediate and final feature extraction layers during training.
An ensemble of the two proposed architectures achieves 99.53% LFW
face verification accuracy and 96.0% LFW rank-1 face
identification accuracy. A further discussion of the LFW face
verification result is given at the end.


1. Introduction 

Using deep neural networks to learn effective feature
representations has become popular in face recognition. With
better deep network architectures and supervisory methods, face
recognition accuracy has improved rapidly in recent years. In
particular, a few notable face representation learning techniques
have evolved recently. An early effort to learn deep face
representations in a supervised way employed face verification as
the supervisory signal, which required classifying a pair of
training images as being the same person or not. It greatly
reduced the intra-personal variations in the face representation.
Then, learning discriminative deep face representations through
large-scale face identity classification (face identification)
was proposed by DeepID and DeepFace. By classifying training
images into a large number of identities, the last hidden layer
of a deep neural network forms rich identity-related features.
With this technique, deep learning got close to human performance
for the first time on tightly cropped face images of the
extensively evaluated LFW face verification dataset. However, the
learned face representation could also contain significant
intra-personal variations. Motivated by both of these approaches,
learning deep face representations by joint face
identification-verification was proposed in DeepID2 and further
improved in DeepID2+. Adding verification supervisory signals
significantly reduced intra-personal variations, leading to
another significant improvement in face recognition performance.
Human face verification accuracy on the entire face images of LFW
was finally surpassed. Both GoogLeNet and VGG ranked at the top
in general image classification in ILSVRC 2014. This motivates us
to investigate whether the superb learning capacity brought by
very deep net structures can also benefit face recognition.

Although supervised by advanced supervisory signals, the network
architectures of DeepID2 and DeepID2+ are much shallower than
recently proposed high-performance deep neural networks for
general object recognition, such as VGG and GoogLeNet. VGG net
stacks multiple convolutional layers together to form complex
features. GoogLeNet goes further by incorporating multi-scale
convolutions and pooling into a single feature extraction layer,
coined inception. To learn efficiently, it also introduces 1x1
convolutions for feature dimension reduction.
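To make the role of the 1x1 convolution concrete, the following minimal sketch counts the weights of a 5x5 convolution applied directly versus after a 1x1 reduction. The channel counts (192, 16, 32) and the helper `conv_params` are hypothetical values chosen only for illustration; they are not taken from this paper.

```python
def conv_params(kernel, c_in, c_out):
    """Number of weights in a kernel x kernel convolution (biases ignored)."""
    return kernel * kernel * c_in * c_out

# Hypothetical channel counts, for illustration only.
c_in, c_reduced, c_out = 192, 16, 32

# 5x5 convolution applied directly to all input channels.
direct = conv_params(5, c_in, c_out)

# 1x1 convolution first shrinks the channel dimension, then the 5x5 runs on fewer channels.
reduced = conv_params(1, c_in, c_reduced) + conv_params(5, c_reduced, c_out)

print(direct, reduced)  # 153600 15872
```

Under these assumed sizes the reduced branch needs roughly a tenth of the weights, which is the efficiency argument made above.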

In this paper, we propose two deep neural network architectures,
referred to as DeepID3, which are significantly deeper than the
previous state-of-the-art DeepID2+ architecture for face
recognition. DeepID3 networks are rebuilt from basic elements
(i.e., stacked convolution or inception layers) of VGG net and
GoogLeNet. During training, joint face
identification-verification supervisory signals are added to the
final feature extraction layer as well as a few intermediate
layers of each network. In addition, to learn a richer pool of
facial features, weights in higher layers of some DeepID3
networks are unshared. Trained on the same dataset as DeepID2+,
DeepID3 improves the face verification accuracy from 99.47% to
99.53% and the rank-1 face identification accuracy from 95.0% to
96.0% on LFW. The "true" face verification accuracy when wrongly
labeled face pairs are corrected, as well as a few hard test
samples, are discussed at the end.

2. DeepID3 net 

For comparison, we briefly review the previously proposed
DeepID2+ net architecture. As illustrated in Fig. 1, DeepID2+ net
has three convolutional layers followed by max-pooling (neurons
in the third convolutional layer share weights only in local
regions), followed by one locally connected layer and one fully
connected layer. Joint identification-verification supervisory
signals are added to the last fully connected layer (from which
the final features are extracted for face recognition) as well as
a few fully connected layers branched out from intermediate
pooling layers to better supervise early feature extraction
processes.
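The text does not spell out the form of the joint supervisory signal. As a minimal sketch, assume a softmax cross-entropy identification loss and a margin-based contrastive verification loss, combined with a weight. The function names, the margin `margin`, and the weight `lam` are all hypothetical, and the same combination would be attached to each supervised layer.

```python
import math

def identification_loss(logits, label):
    """Softmax cross-entropy of one image over identity classes."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]

def verification_loss(f_i, f_j, same, margin=1.0):
    """Contrastive loss on a feature pair; margin is a hypothetical value."""
    dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(f_i, f_j)))
    if same:
        return 0.5 * dist ** 2                    # pull same-identity features together
    return 0.5 * max(0.0, margin - dist) ** 2     # push different identities apart

def joint_loss(logits_i, label_i, logits_j, label_j, f_i, f_j, lam=0.05):
    """Identification loss on both images plus weighted verification loss (lam assumed)."""
    ident = (identification_loss(logits_i, label_i)
             + identification_loss(logits_j, label_j))
    verif = verification_loss(f_i, f_j, label_i == label_j)
    return ident + lam * verif
```

The identification term spreads identities apart, while the verification term shrinks within-identity distances; this is the intuition behind the reduced intra-personal variation described in the introduction.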

The proposed DeepID3 net inherits a few characteristics of the
DeepID2+ net, including unshared neural weights in the last few
feature extraction layers and the way of adding supervisory
signals to early layers. However, the DeepID3 net is
significantly deeper, with ten to fifteen non-linear feature
extraction layers, compared to five in DeepID2+. In particular,
we propose two DeepID3 net architectures, referred to as DeepID3
net1 and DeepID3 net2, as illustrated in Fig. 2 and Fig. 3,
respectively. The depth of DeepID3 net comes from stacking
multiple convolution/inception layers before each pooling layer.
Consecutive convolution/inception layers help to form features
with larger receptive fields and more complex nonlinearity while
restricting the number of parameters.
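The trade-off between receptive field and parameter count can be checked with a short calculation. This sketch assumes 3x3 kernels with stride 1 and an equal number of input and output channels; both are illustrative assumptions, since the text does not state kernel sizes or channel counts.

```python
def receptive_field(num_layers, kernel):
    """Receptive field of stacked stride-1 convolutions with equal kernel size."""
    return 1 + num_layers * (kernel - 1)

def stacked_params(num_layers, kernel, channels):
    """Weights of num_layers stacked convolutions (channels in = channels out, no biases)."""
    return num_layers * kernel * kernel * channels * channels

def single_params(kernel, channels):
    """Weights of one convolution with the given kernel size (no biases)."""
    return kernel * kernel * channels * channels

channels = 64  # hypothetical channel count

rf = receptive_field(2, 3)                        # two 3x3 layers see a 5x5 window
two_small = stacked_params(2, 3, channels)        # 2 * 9 * 64 * 64
one_large = single_params(rf, channels)           # 25 * 64 * 64

print(rf, two_small, one_large)  # 5 73728 102400
```

Two stacked 3x3 layers cover the same 5x5 window as one 5x5 layer with fewer weights, and they apply two non-linearities instead of one.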

Figure 1: Architecture of DeepID2+ net. Solid arrows show
forward-propagation directions. Dashed arrows point to the layers
on which joint face identification-verification supervisory
signals are added. The final feature extraction layer in the red
box is used for face recognition.

The proposed DeepID3 net1 places two consecutive convolutional
layers before each pooling layer. Compared to the VGG net
proposed in previous literature, we add additional supervisory
signals in a number of fully connected layers branched out from
intermediate layers, which helps to learn better mid-level
features and makes optimization of a very deep neural network
easier. The top two convolutional layers are replaced by locally
connected layers. With unshared parameters, the top layers can
form more expressive features with a reduced feature dimension.
The last locally connected layer of DeepID3 net1 is used to
extract the final features without an additional fully connected
layer.
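The difference between a convolutional layer and a locally connected layer is only whether the kernel is shared across positions. The 1-D sketch below illustrates this; the function names and toy values are hypothetical, and real layers are 2-D with multiple channels.

```python
def conv1d(x, kernel):
    """Valid 1-D cross-correlation: one kernel shared by every position."""
    k = len(kernel)
    return [sum(x[p + t] * kernel[t] for t in range(k))
            for p in range(len(x) - k + 1)]

def locally_connected1d(x, kernels):
    """Same sliding window, but output position p uses its own kernel kernels[p]."""
    k = len(kernels[0])
    assert len(kernels) == len(x) - k + 1, "one kernel per output position"
    return [sum(x[p + t] * kernels[p][t] for t in range(k))
            for p in range(len(kernels))]

x = [1.0, 2.0, 3.0, 4.0]
shared = [1.0, -1.0]                                   # 2 weights in total
unshared = [[1.0, -1.0], [0.5, 0.5], [0.0, 2.0]]       # 2 weights per position, 6 in total

conv_out = conv1d(x, shared)
local_out = locally_connected1d(x, unshared)

print(conv_out)   # [-1.0, -1.0, -1.0]
print(local_out)  # [-1.0, 2.5, 8.0]
```

Unshared weights cost more parameters per output position, but each position can respond to a different pattern. This suits aligned faces, where a given location tends to show the same facial part.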

DeepID3 net2 starts, as DeepID3 net1 does, with every two
consecutive convolutional layers followed by one pooling layer,
while using inception layers in later feature extraction stages:
there are three consecutive inception layers before the third
pooling layer and two inception layers before the fourth pooling
layer. Joint identification-verification supervisory signals are
added on fully connected layers following each pooling layer.

In the two proposed network architectures, rectified linear
non-linearity is used for all layers except pooling layers, and
dropout learning is added on the final feature extraction layer.
Despite their significant depth, our DeepID3 networks are much
smaller than the VGG net or GoogLeNet proposed for general object
recognition, due to a restricted number of feature maps in each
layer.

The proposed DeepID3 nets are trained on the same 25 face regions
as DeepID2+ nets, with each network taking a particular face
region as input. These face regions, selected by feature
selection in the previous work, differ in position, scale, and
color channel so that different networks can learn complementary
information. After training, these networks are used to extract
features from their respective face regions. An additional Joint
Bayesian model is then learned on these features for face
verification or identification. All the DeepID3 networks and
Joint Bayesian models are learned on the same approximately 300
thousand training samples as used in DeepID2+, a combination of
the CelebFaces+ and WDRef datasets, and tested on LFW. People in
these two training data sets and the LFW test set are mutually
exclusive. The face verification performance on LFW of individual
DeepID3 nets is compared to that of DeepID2+ nets in Fig. 4 on
the 25 face regions (with horizontal flipping). On average,
DeepID3 net1 and DeepID3 net2 reduce the error rate by 0.81% and
0.26% compared to DeepID2+ net, respectively.

Figure 2: Architecture of DeepID3 net1. Figure description is the
same as Fig. 1.

3. Experiments 

Figure 3: Architecture of DeepID3 net2. Figure description is the
same as Fig. 1.

Figure 4: LFW face verification accuracy of individual DeepID2+
and DeepID3 nets trained on the same face regions.

Table 1: Face verification on LFW.

method               accuracy (%)
High-dim LBP         95.17 ± 1.13
TL Joint Bayesian    96.33 ± 1.08
DeepFace             97.35 ± 0.25
DeepID               97.45 ± 0.26
GaussianFace         98.52 ± 0.66
DeepID2              99.15 ± 0.13
DeepID2+             99.47 ± 0.12
DeepID3              99.53 ± 0.10

To reduce redundancy, DeepID3 net1 and net2 are used to extract
features on either the original or the horizontally flipped face
region, but not both. At test time, feature extraction takes 50
forward propagations, half from DeepID3 net1 and the other half
from net2. These features are concatenated into a long feature
vector of approximately 30,000 dimensions. With PCA, it is
reduced to 300 dimensions, on which a Joint Bayesian model is
learned for face recognition.
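The bookkeeping of the test-time feature extraction (25 regions, two nets, 50 forward passes, one long concatenated vector) can be sketched as follows. The stand-in `extract_features`, the per-pass dimension of 600 (inferred from roughly 30,000 / 50), and the rule that assigns flipped inputs to odd-numbered regions are all assumptions made for illustration. PCA and the Joint Bayesian model are omitted.

```python
NUM_REGIONS = 25
NETS = ("net1", "net2")
FEATURES_PER_PASS = 600  # assumption: about 30,000 dimensions / 50 passes

def extract_features(net, region, flipped):
    """Hypothetical stand-in for one forward pass; returns a dummy feature vector."""
    return [0.0] * FEATURES_PER_PASS

def extraction_plan():
    """One pass per (net, region), on the original or the flipped region, not both."""
    plan = []
    for net in NETS:
        for region in range(NUM_REGIONS):
            flipped = region % 2 == 1  # hypothetical assignment rule
            plan.append((net, region, flipped))
    return plan

def face_descriptor():
    """Concatenate the features of all passes into one long vector."""
    vec = []
    for net, region, flipped in extraction_plan():
        vec.extend(extract_features(net, region, flipped))
    return vec

plan = extraction_plan()
descriptor = face_descriptor()
print(len(plan), len(descriptor))  # 50 30000
```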

We evaluate DeepID3 networks under the LFW face verification and
LFW face identification protocols. For face verification, 6000
given face pairs are verified to tell whether they are from the
same person. We achieve a mean accuracy of 99.53% under this
protocol. Comparisons with previous works on mean accuracy and
ROC curves are shown in Tab. 1 and Fig. 5, respectively.

For face identification, we adopt one closed-set and one open-set
identification protocol. For closed-set identification, the
gallery set contains 4249 subjects with a single face image per
subject, and the probe set contains 3143 face images from the
same set of subjects in the gallery. For open-set identification,
the gallery set contains 596 subjects with a single face image
per subject, and the probe set contains 596 genuine probes and
9494 impostor ones. Table 2 compares the Rank-1 identification
accuracy of closed-set identification and the Rank-1 Detection
and Identification Rate (DIR) at a 1% False Alarm Rate (FAR) of
open-set identification. We achieve 96.0% closed-set and 81.4%
open-set face identification accuracy, respectively.
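The two identification metrics can be sketched on toy similarity scores. This is a minimal illustration under stated assumptions: higher score means more similar, the rejection threshold is chosen from the impostors' best-match scores, and the function names and toy data are hypothetical rather than part of the LFW protocol files.

```python
def top_match(scores):
    """Index and value of the best gallery match for one probe."""
    best = max(range(len(scores)), key=lambda g: scores[g])
    return best, scores[best]

def rank1_accuracy(probe_scores, probe_labels, gallery_labels):
    """Closed-set: fraction of probes whose best gallery match has the right identity."""
    hits = sum(gallery_labels[top_match(s)[0]] == y
               for s, y in zip(probe_scores, probe_labels))
    return hits / len(probe_labels)

def dir_at_far(genuine_scores, genuine_labels, gallery_labels, impostor_scores, far):
    """Open-set: rank-1 detection and identification rate at a given false alarm rate."""
    impostor_best = sorted((max(s) for s in impostor_scores), reverse=True)
    k = int(far * len(impostor_best))       # impostors allowed above the threshold
    threshold = impostor_best[k] if k < len(impostor_best) else float("-inf")
    hits = 0
    for s, y in zip(genuine_scores, genuine_labels):
        g, score = top_match(s)
        if gallery_labels[g] == y and score > threshold:
            hits += 1                       # correctly identified and accepted
    return hits / len(genuine_labels)

gallery = ["a", "b", "c"]
genuine = [[0.9, 0.1, 0.2], [0.3, 0.6, 0.2], [0.7, 0.1, 0.4]]
labels = ["a", "b", "c"]                    # third probe is mismatched to "a"
impostors = [[0.8, 0.1, 0.1], [0.5, 0.2, 0.1], [0.3, 0.1, 0.2], [0.2, 0.1, 0.1]]

closed = rank1_accuracy(genuine, labels, gallery)
open_set = dir_at_far(genuine, labels, gallery, impostors, far=0.25)
```

In this toy case both metrics equal 2/3. Tightening the false alarm rate to zero raises the threshold to 0.8 and leaves only the first probe accepted.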

4. Discussion 

Figure 5: ROC of face verification on LFW.

Table 2: Closed- and open-set identification tasks on LFW.

method        Rank-1 (%)    DIR @ 1% FAR (%)
COTS-s1       56.7          25
COTS-s1+s4    66.5          35
DeepFace      64.9          44.5
WST Fusion    82.5          61.9
DeepID2+      95.0          80.7
DeepID3       96.0          81.4

There are three test face pairs that are labeled as the same
person but are actually different people, as announced on the LFW
website. Among these three pairs, two are classified as the same
person while the other one is classified as different people by
our DeepID3 algorithm. Therefore, when the labels of these three
face pairs are corrected, the actual face verification accuracy
of DeepID3 is 99.52%. For DeepID2+, the face verification
accuracy before correcting the three wrong labels is 99.47%.
However, DeepID2+ classified all three wrongly labeled positive
face pairs as different people. When these three wrong labels are
corrected, the true face verification accuracy of DeepID2+ is
also 99.52%. DeepID3, although adopting very deep architectures
similar to VGG and GoogLeNet, does not improve over DeepID2+,
which has a significantly shallower architecture, on the LFW face
verification task. Whether those very deep architectures would
take advantage of more training face data and finally surpass
shallower architectures like DeepID2+ remains an open question.
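The corrected accuracies follow from simple counting over the 6000 test pairs. The sketch below assumes that each reported mean accuracy corresponds to a whole number of correctly classified pairs out of 6000; the helper name is hypothetical.

```python
TOTAL_PAIRS = 6000  # LFW verification pairs, as stated in the text

def corrected_accuracy(reported_pct, predicted_same, predicted_diff):
    """Accuracy after relabeling pairs that were wrongly labeled "same person".

    predicted_same: mislabeled pairs the model called "same" (counted right, now wrong)
    predicted_diff: mislabeled pairs the model called "different" (counted wrong, now right)
    """
    correct = round(reported_pct / 100 * TOTAL_PAIRS)
    correct = correct - predicted_same + predicted_diff
    return round(100 * correct / TOTAL_PAIRS, 2)

deepid3 = corrected_accuracy(99.53, predicted_same=2, predicted_diff=1)
deepid2plus = corrected_accuracy(99.47, predicted_same=0, predicted_diff=3)

print(deepid3, deepid2plus)  # 99.52 99.52
```

DeepID3 goes from 5972 to 5971 correct pairs and DeepID2+ from 5968 to 5971, so both end at 5971/6000, which rounds to 99.52%.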

We examine the test face pairs in LFW that are wrongly classified
by all the DeepID series algorithms, including DeepID, DeepID2,
DeepID2+, and DeepID3. There are nine common false positives and
three common false negatives in total, around half of all face
pairs wrongly classified by DeepID3. The three face pairs labeled
as the same person but classified as different people are shown
in Fig. 6. The first pair of faces shows a great contrast in age.
The second pair is actually two different people due to errors in
labeling. The third one is an actress with significantly
different makeup. Fig. 7 shows the nine face pairs labeled as
different people but classified as the same person by the
algorithms. Most of them look similar or have interference such
as occlusions.



Figure 6: Common false negatives in DeepID series 
algorithms. 



Figure 7: Common false positives in DeepID series 
algorithms. 


5. Conclusion 

This paper proposes two significantly deeper neural network
architectures, coined DeepID3, for face recognition. The proposed
DeepID3 networks achieve state-of-the-art performance on both LFW
face verification and identification tasks. However, when a few
wrong labels in LFW are corrected, the improvement of DeepID3
over DeepID2+ on LFW face verification vanishes. The
effectiveness of very deep neural networks will be further
investigated on larger-scale training data in the future.